Development of a Modern Greek Broadcast-News Corpus and Speech Recognition System

نویسندگان

  • Jürgen Riedler
  • Sergios Katsikas
چکیده

We report on the creation of a Modern Greek broadcast-news corpus as a pre-requisite to build a large-vocabulary continuous-speech recognition system. We discuss lexical modelling with respect to pronuciation generation and examine the effects of the lexicon size on word accuracies. Peculiarities of Modern Greek as a highly inflectional language and their challenges for speech recognition are discussed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

“My Small Slim Greek ASR System” or Automatic Speech Recognition of Modern Greek Broadcast News

In this paper we report on the development of a Modern Greek large-vocabulary continuous-speech recognition system. We discuss lexical modelling with respect to pronuciation generation and examine its effects on word accuracies. Peculiarities of Modern Greek as a highly inflectional language and their challenges for speech recognition are addressed.

متن کامل

Thai Broadcast News Corpus Construction and Evaluation

Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora. Specifications and conventions used in the transcription process are described in the paper. The speech corpus contains about 17 hours of speech data while the text corpus was...

متن کامل

Spanish broadcast news transcription

We describe the Sail Labs Media Mining System (MMS) aimed at the transcription of Castilian Spanish broadcastnews. In contrast to previous systems, the focus of this system is on Spanish as spoken on the Iberian Peninsula as opposed to the Americas. We discuss the development of a Castilian Spanish broadcast-news corpus suitable for training the various system components of the MMS and report o...

متن کامل

The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work

The aim of the paper is to search for common guidelines for the future development of speech databases for less resourced languages in order to make them the most useful for both main fields of their use, linguistic research and speech technologies. We compare two standards for creating speech databases, one followed when developing the Slovene speech database for automatic speech recognition –...

متن کامل

MATBN: A Mandarin Chinese Broadcast News Corpus

The MATBN Mandarin Chinese broadcast news corpus contains a total of 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary purpose of this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast news domain. In this paper, we briefly introduce the speech corpus and r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007